{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 第12章 排序与选择"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "归并排序实现："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 50,
   "metadata": {},
   "outputs": [],
   "source": [
    "def merge_sort(sequence, start, stop):\n",
    "    if start == stop:\n",
    "        return [sequence[start]]\n",
    "    middle = (start + stop) // 2\n",
    "    sequence1 = merge_sort(sequence, start, middle)\n",
    "    sequence2 = merge_sort(sequence, middle + 1, stop)\n",
    "    result = []\n",
    "    i = 0\n",
    "    j = 0\n",
    "    while i < len(sequence1) or j < len(sequence2):\n",
    "        if i == len(sequence1):\n",
    "            result.extend(sequence2[j:])\n",
    "            j = len(sequence2)\n",
    "        elif j == len(sequence2):\n",
    "            result.extend(sequence1[i:])\n",
    "            i = len(sequence1)\n",
    "        elif sequence1[i] < sequence2[j]:\n",
    "            result.append(sequence1[i])\n",
    "            i += 1\n",
    "        else:\n",
    "            result.append(sequence2[j])\n",
    "            j += 1\n",
    "    return result"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 51,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[4, 10, 11, 13, 21, 53, 234]\n"
     ]
    }
   ],
   "source": [
    "print(merge_sort([10, 21, 13, 4, 234, 11, 53], 0, 6))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "这里注意递归的时间复杂度分析：只看非递归部分的时间复杂度和递归次数。只看非递归部分的原因是：随着问题规模的缩小，问题规模大的递归部分的时间复杂度是由问题规模小的非递归部分的时间复杂度组成的，因此并不是没有计算递归部分的时间复杂度，而是递归部分的时间复杂度化为了更小规模问题的非递归部分时间复杂度，然后最后到达基线条件，就不存在递归部分的时间复杂度了，全部化为了非递归部分。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "归并排序的空间复杂度：如果没有start和stop，那么log(n)层每一层都需要O(n)，额外空间达到O(nlog(n))，有了start和stop之后，在函数调用栈中，S只占了O(n)的空间，之后result在栈的每一层占O(n)，但是用完就不占了，下一层再O(n)，所以总共O(n)。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "快速排序非就地算法实现："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {},
   "outputs": [],
   "source": [
    "def quick_sort(S):\n",
    "    if len(S) <= 1:\n",
    "        return S\n",
    "    S1 = []\n",
    "    S2 = []\n",
    "    S3 = []\n",
    "    for i in S:\n",
    "        if i < S[0]:\n",
    "            S1.append(i)\n",
    "        elif i == S[0]:\n",
    "            S2.append(i)\n",
    "        else:\n",
    "            S3.append(i)\n",
    "    return quick_sort(S1) + S2 + quick_sort(S3)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[4, 10, 11, 13, 21, 53, 234]\n"
     ]
    }
   ],
   "source": [
    "print(quick_sort([10, 21, 13, 4, 234, 11, 53]))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "快速排序就地算法：一般直接改变序列，而且只使用一点额外的空间。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [],
   "source": [
    "def inplace_quick_sort(S, a, b):\n",
    "    if a >= b:\n",
    "        return\n",
    "    z = S[b]                             ## 作为分界值\n",
    "    left = a\n",
    "    right = b - 1\n",
    "    while left <= right:\n",
    "        while left <= right and S[left] < z:\n",
    "            left += 1\n",
    "        while left <= right and S[right] > z:\n",
    "            right -= 1\n",
    "        if left <= right:\n",
    "            S[left], S[right] = S[right], S[left]\n",
    "            left, right = left + 1, right - 1\n",
    "    S[left], S[b] = S[b], S[left]\n",
    "    inplace_quick_sort(S, a, left - 1)\n",
    "    inplace_quick_sort(S, left + 1, b)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[4, 10, 11, 13, 21, 53, 234]\n"
     ]
    }
   ],
   "source": [
    "mylist = [10, 21, 13, 4, 234, 11, 53]\n",
    "inplace_quick_sort(mylist, 0, 6)\n",
    "print(mylist)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "归并排序和快速排序对比:\n",
    "\n",
    "1. 归并排序比快速排序稳定，归并排序在递归时，问题规模缩小的速度是确定的，每次缩小一半，归并排序树的高度一定为log(n)，而且每一层需要的时间一定是O(n)，而快速排序树的高度可能为O(n)，尤其是序列有序的时候，本来应该更快，结果更慢了。\n",
    "2. 快速排序有就地算法，归并排序没有，归并排序需要O(nlog(n))的额外空间，快速排序有就地算法，只需要O(log(n))，顺便提一下，堆排序只需要O(1)。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "随机快速排序：基准值不是固定索引，而是随机抽取。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "基于比较的排序，需要比较各个值的大小来确定位置，算法的时间复杂度不会比O(nlog(n))更快。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "桶排序：待排序的序列长度为n，取值范围是不大于N的自然数，那么只需要O(n + N)的时间就能排好序。算法：新建长度为N的数组，遍历待排序的序列，将值v存储到索引v，然后依次pop即可。（桶排序不是基于比较的排序）"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "稳定排序：对于相等的值，在排序后序列中的前后位置与排序前序列中的前后位置保持一致。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "显然桶排序是稳定排序。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "基数排序：用于序列间的排序（如字符串之间的排序），相当于使用多次桶排序，排序的方法是字典序，即先比较第一个元素，没结果比较第二个元素，以此类推。基数排序实现方法：先对最后一个元素使用桶排序，然后倒数第二个元素使用桶排序，然后一直到第一个元素使用桶排序，所以如果有n个长度为m的序列需要排序，取值范围限制为N，那么时间复杂度为O(m(n + N))。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "桶排序稳定的作用：在基数排序中，从后往前进行桶排序，如果前面的桶排序都比不出结果，那么基于桶排序稳定的作用，会根据第一个能比出结果的桶排序的顺序对相等的元素进行排序，正好是字典序的定义——根据第一个不相等的元素排序。更好的理解方式：从前往后，m个元素，如果前i个都相等，那么字典序来看应该取决于第i + 1个，从桶排序的顺序来看也是如此，桶排序从后往前，从m - 1的位置到达i + 1的位置，因为i + 1的元素不相同，所以是根据i + 1确定的序列前后顺序，然后i + 1往前的桶排序因为是稳定的，且元素都相等，因此都不会改变这两个序列的相对顺序，所以这两个序列的顺序是根据i + 1个元素来的，正好符合字典序。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "稳定排序与字典序：从后往前的稳定排序的思想，正好是如果前面能区分，那么覆盖原来的排序，如果不能区分，根据稳定排序，由原来的排序来决定顺序。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "基数排序的延伸——基于自己的思考：对于字典序的实现，使用从后往前的多次稳定排序即可。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "插入排序：插入排序的运行时间是O(n + m)，m是逆序的数量，逆序的数量是原始序列中，每个元素之前有多少个比它大的，所有元素的逆序数之和为m。在比较有序的序列中，插入排序的表现是很好的，但在完全逆序的情况下，插入排序的表现非常大。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Python的内置排序函数**"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "list有sort方法，可以直接改变list的顺序，也有sorted函数，作用于list产生一个新的已排序的list。排序可以增加键函数，对将每个元素输入键函数，输出的值作为该元素的键，然后根据键进行排序。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 选择"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "选择：选出第k大或者第k小的元素。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "剪枝搜索：二分查找借助了这个思路，即每次都会剪掉很大一部分的序列，因为输出的值不会从剪掉的序列中产生。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**随机快速选择**"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "基于快速排序的思想：从序列sequence中选出第k大的元素，首先选择一个基准值，如果小于基准值超过k个，那么剪枝剪掉大于等于基准值的，在小于基准值的序列中继续选出第大的元素，如果小于基准值的没有超过k个，但是小于等于基准值的超过或者等于k，那么第k大的就是基准值，如果小于等于基准值的不超过k个，说明sequence第k大的就是大于基准值的序列中第k - m大的，m是小于等于基准值的元素个数。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "随机：随机选择基准值。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 58,
   "metadata": {},
   "outputs": [],
   "source": [
    "def quick_select(sequnce, k):\n",
    "    if len(sequnce) == 1:\n",
    "        return sequnce[0]\n",
    "    S1 = []\n",
    "    S2 = []\n",
    "    S3 = []\n",
    "    for i in sequnce:\n",
    "        if i < sequnce[0]:\n",
    "            S1.append(i)\n",
    "        elif i == sequnce[0]:\n",
    "            S2.append(i)\n",
    "        else:\n",
    "            S3.append(i)\n",
    "    if len(S1) >= k:\n",
    "        return quick_select(S1, k)\n",
    "    elif len(S1) + len(S2) >= k:\n",
    "        return S2[0]\n",
    "    else:\n",
    "        return quick_select(S3, k - len(S1) - len(S2))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 59,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "11\n"
     ]
    }
   ],
   "source": [
    "print(quick_select([10, 21, 13, 4, 234, 11, 53], 3))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "随机快速选择的时间复杂度：期望为O(n)。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 练习"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "R-12.4"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "归并排序可以是稳定排序也可以是不稳定的，对于是否稳定，如果在构建result的时候，左边的小于等于就放进result的，那么是稳定的，但如果是小于就放进去，等于是右边的，那么就是不稳定的。"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.2"
  },
  "toc": {
   "base_numbering": 1,
   "nav_menu": {},
   "number_sections": false,
   "sideBar": true,
   "skip_h1_title": false,
   "title_cell": "Table of Contents",
   "title_sidebar": "Contents",
   "toc_cell": false,
   "toc_position": {},
   "toc_section_display": true,
   "toc_window_display": false
  },
  "varInspector": {
   "cols": {
    "lenName": 16,
    "lenType": 16,
    "lenVar": 40
   },
   "kernels_config": {
    "python": {
     "delete_cmd_postfix": "",
     "delete_cmd_prefix": "del ",
     "library": "var_list.py",
     "varRefreshCmd": "print(var_dic_list())"
    },
    "r": {
     "delete_cmd_postfix": ") ",
     "delete_cmd_prefix": "rm(",
     "library": "var_list.r",
     "varRefreshCmd": "cat(var_dic_list()) "
    }
   },
   "types_to_exclude": [
    "module",
    "function",
    "builtin_function_or_method",
    "instance",
    "_Feature"
   ],
   "window_display": false
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
